
FEAT Add BeaverTails dataset loader #1424

Open
romanlutz wants to merge 8 commits into Azure:main from romanlutz:romanlutz/add-beaver-tails-dataset

Conversation

@romanlutz
Contributor

Add remote dataset loader for BeaverTails (PKU-Alignment/BeaverTails), containing 330k+ QA pairs annotated across 14 harm categories for safety alignment research. Filters to unsafe entries by default.

Copilot AI review requested due to automatic review settings March 1, 2026 14:28
@romanlutz romanlutz force-pushed the romanlutz/add-beaver-tails-dataset branch from 7b635d9 to b652d70 Compare March 1, 2026 14:28
Contributor

Copilot AI left a comment

Pull request overview

Adds a new remote seed dataset loader for the BeaverTails HuggingFace dataset, making it discoverable via SeedDatasetProvider and documenting its availability.

Changes:

  • Introduces _BeaverTailsDataset remote loader with optional unsafe_only filtering (default: unsafe only).
  • Registers the loader in the remote datasets module and adds unit tests for filtering behavior.
  • Updates the “Loading Built-in Datasets” notebook output to include the new dataset name.
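The unsafe-only filtering described in the list above can be illustrated with a minimal sketch. It assumes rows are plain dictionaries with `prompt`, `is_safe`, and `category` fields, as in the diff excerpt quoted later in this conversation; the helper name `select_prompts` and the category names in the sample rows are hypothetical and not part of the PR.

```python
from typing import Any, Iterable


def select_prompts(rows: Iterable[dict[str, Any]], unsafe_only: bool = True) -> list[dict[str, Any]]:
    """Return prompt records, skipping rows flagged safe when unsafe_only is set."""
    selected = []
    for row in rows:
        if unsafe_only and row["is_safe"]:
            continue
        # Keep only the harm category names whose flag is truthy.
        harm_categories = [name for name, flagged in row["category"].items() if flagged]
        selected.append({"prompt": row["prompt"], "harm_categories": harm_categories})
    return selected


# Hypothetical rows; the category names here are placeholders.
sample_rows = [
    {"prompt": "p1", "is_safe": False, "category": {"violence": True, "privacy": False}},
    {"prompt": "p2", "is_safe": True, "category": {"violence": False, "privacy": False}},
]

unsafe = select_prompts(sample_rows)
everything = select_prompts(sample_rows, unsafe_only=False)
```

With the default, only the row marked unsafe is kept; passing `unsafe_only=False` returns every row, including those with no flagged categories.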

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pyrit/datasets/seed_datasets/remote/beaver_tails_dataset.py | New HuggingFace-backed loader that converts BeaverTails rows into SeedPrompts (unsafe-only by default). |
| pyrit/datasets/seed_datasets/remote/__init__.py | Imports/exports the new loader so it’s auto-registered/discoverable. |
| tests/unit/datasets/test_beaver_tails_dataset.py | Adds unit tests covering unsafe-only vs all-entries behavior and dataset naming. |
| doc/code/datasets/1_loading_datasets.ipynb | Notebook updated to reflect the new dataset in the available list (but now includes executed outputs/metadata). |

@romanlutz romanlutz force-pushed the romanlutz/add-beaver-tails-dataset branch 2 times, most recently from 9741ae3 to 1fd2ef7 Compare March 2, 2026 13:02
Copilot AI review requested due to automatic review settings March 2, 2026 13:02
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

Copilot AI review requested due to automatic review settings March 2, 2026 13:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings March 2, 2026 15:07
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

romanlutz and others added 7 commits March 2, 2026 13:48
Add remote dataset loader for BeaverTails (PKU-Alignment/BeaverTails), containing
330k+ QA pairs annotated across 14 harm categories for safety alignment research.
Filters to unsafe entries by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
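The commit note above about moving the dataset identifier from a constructor parameter to a class constant can be sketched as follows. `ExampleRemoteLoader` is a hypothetical stand-in, not the PyRIT class; only the constant name `HF_DATASET_NAME`, the identifier `PKU-Alignment/BeaverTails`, and the URL pattern come from this conversation.

```python
class ExampleRemoteLoader:
    """Hypothetical stand-in for a remote loader; not the PyRIT class."""

    # Fixed identifier shared by all instances, rather than a constructor argument.
    HF_DATASET_NAME = "PKU-Alignment/BeaverTails"

    def __init__(self, *, unsafe_only: bool = True) -> None:
        self.unsafe_only = unsafe_only

    @property
    def source_url(self) -> str:
        # Derived from the class constant, so it cannot drift per instance.
        return f"https://huggingface.co/datasets/{self.HF_DATASET_NAME}"


loader = ExampleRemoteLoader()
```

One consequence of this design is that callers can no longer point the loader at a different repository by accident; only behavior flags such as `unsafe_only` remain configurable.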
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
For a 330k-row dataset, this avoids hundreds of thousands of
redundant string/list allocations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
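The allocation saving mentioned in the commit note above is presumably achieved by building loop-invariant values once, before iterating over the rows; that reading is an assumption, since the commit itself is not shown here. A minimal sketch with hypothetical names (`build_records`, the placeholder description and group):

```python
def build_records(rows, dataset_name="example_dataset"):
    # Loop-invariant values are created once, before the loop.
    description = "Example description shared by every record."
    groups = ["Example Group"]

    records = []
    for row in rows:
        records.append(
            {
                "value": row["prompt"],
                "dataset_name": dataset_name,
                "description": description,  # same str object for every record
                "groups": groups,  # same list object for every record
            }
        )
    return records


records = build_records([{"prompt": "a"}, {"prompt": "b"}])
```

The trade-off is that every record shares the same list object, so mutating `groups` on one record would affect all of them; this is only safe when consumers treat the shared values as read-only.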
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 21:50
@romanlutz romanlutz force-pushed the romanlutz/add-beaver-tails-dataset branch from 8a9dccb to a91052f Compare March 2, 2026 21:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +85 to +110
description = (
    "BeaverTails is a collection of 330k+ human-LLM QA pairs annotated across 14 harm "
    "categories, designed for safety alignment research. Introduced in 'BeaverTails: "
    "Towards Improved Safety Alignment of LLM via a Human-Preference Dataset' (2023)."
)

source_url = f"https://huggingface.co/datasets/{self.HF_DATASET_NAME}"
groups = ["Institute for Artificial Intelligence", "CFCS, School of Computer Science"]

seed_prompts = []
for item in data:
    if self.unsafe_only and item["is_safe"]:
        continue

    harm_categories = [k for k, v in item["category"].items() if v]

    seed_prompts.append(
        SeedPrompt(
            value=f"{{% raw %}}{item['prompt']}{{% endraw %}}",
            data_type="text",
            dataset_name=self.dataset_name,
            harm_categories=harm_categories,
            description=description,
            source=source_url,
            authors=authors,
            groups=groups,

Copilot AI Mar 2, 2026

The description/docstring emphasizes that BeaverTails contains QA pairs, but the loader currently only emits SeedPrompt values from item['prompt'] and ignores the associated response. To avoid misleading consumers, either (a) explicitly document that only the prompt column is extracted (similar to other dataset loaders), or (b) include the response in SeedPrompt.metadata (or a paired seed type if supported) so the QA relationship isn’t lost.
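Option (b) from the comment above could look roughly like the sketch below. `PromptRecord` is a hypothetical stand-in used only for illustration, not the PyRIT `SeedPrompt` class, and the `response` column name and the `include_response` flag are assumptions rather than part of the PR.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    """Hypothetical stand-in for a seed prompt; not the PyRIT SeedPrompt class."""

    value: str
    metadata: dict[str, str] = field(default_factory=dict)


def to_record(item: dict, include_response: bool = False) -> PromptRecord:
    metadata = {}
    if include_response:
        # Carry the paired answer along so the QA relationship is not lost.
        metadata["response"] = item["response"]
    return PromptRecord(value=item["prompt"], metadata=metadata)


# Assumed row shape: a "prompt" column and a "response" column.
item = {"prompt": "example question", "response": "example answer"}
prompt_only = to_record(item)
with_response = to_record(item, include_response=True)
```

Keeping the response opt-in preserves the current prompt-only behavior by default, which corresponds to option (a) as long as the docstring states that only the prompt column is extracted.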

Comment on lines 195 to 201
{
 "name": "stderr",
 "output_type": "stream",
 "text": [
  "C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_50620\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
  "C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_50556\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
  "  memory.get_seeds(harm_categories=[\"illegal\"], is_objective=True)\n"
 ]

Copilot AI Mar 2, 2026

This notebook diff still includes captured runtime output with user/machine-specific absolute paths (e.g., C:\\Users\\...\\AppData\\Local\\Temp\\ipykernel_...). Please clear cell outputs (and any execution metadata) before committing so docs remain deterministic and don’t leak local environment details.
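Clearing outputs as requested above can be done with existing notebook tooling; as an illustration of what that amounts to, here is a minimal sketch using only the standard library. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the helper name `clear_outputs` is hypothetical, and the project may already have its own preferred workflow for this.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Strip outputs and execution counts from code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# Tiny in-memory notebook in nbformat 4 shape, used only for illustration.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"name": "stderr", "output_type": "stream", "text": ["warning\n"]}],
            "source": ["print('hi')"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned_text = json.dumps(clear_outputs(nb), indent=1)
```

For a real file, the same function would be applied to the result of `json.load` and the cleaned dictionary written back with `json.dump`, which removes captured stderr text such as the local temp paths quoted in the diff.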

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants